1. INTRODUCTION

The technology can be used for recognizing human physical characteristics [1-10], including hand
recognition, which can serve as a communication tool [11-13]. Previous research used the discrete wavelet
transform (DWT) with support vector machine (SVM) classification [14]. That study obtained 94% accuracy with
cross-validation performed five times on 50 data samples from seven actions. When 231 samples were used for
training and the remaining 119 for testing, the accuracy was 93.27%. Tests were also carried out with
256x256 pixel images and level 5 decomposition, which produced an accuracy of 93.14%. DWT provides time and
frequency information simultaneously, and wavelets can be arranged and adapted as needed [15]. The hidden
Markov model (HMM) has the advantage of being able to address the problems of evaluation, inference, and
learning [16]. HMM is widely used in various applications, has an effective learning algorithm, and can handle
variations in record structure [2]. According to [17], static hand gesture recognition using HMM achieved
an average accuracy of 93.38%.

The purpose of this research was to design a hand gesture recognition system based on digital images,
using DWT for feature extraction and HMM as the classification algorithm, and then to test and analyze
the system's performance. The research addresses the following problems: how to design and simulate a hand
gesture recognition system for the dataset using DWT and HMM classification; how changing the input parameter
values affects system performance with DWT feature extraction and HMM classification; how well the system
performs; and how its accuracy and computation time compare. The dataset was collected by the authors and
consists of hand gesture images taken with a smartphone camera at 13 MP resolution.


2. RESEARCH METHOD 
2.1. Discrete wavelet transform 

The DWT method is used to extract hand features. It creates a feature matrix that represents the pixel
values of the corresponding image. The feature extraction process is explained using the example matrix in
Table 1. First, calculate the average and the half-difference of each horizontal pixel pair of the hand gesture
contour image, as in Table 2; the result is shown in Table 3. Second, apply the same calculation to each vertical
pixel pair of Table 3, as illustrated in Table 4. This produces the sub-bands LL, LH, HL, and HH of the contour
image, illustrated in Table 5. Finally, the DWT feature extraction is repeated until every contour image has
been successfully extracted [18-21].


Table 1. 6x6 matrix sample
135  120   90   98  132  122
140  126   95   94  121  114
144  129   88   90  119  111
129  121   85   78  109  109
116  106   73   72  106   99
 98   80   50   53   88   79


Table 2. Illustration of calculation process of average pixel pair based on the row
Pixel Pair
(135 + 120)/2   (90 + 98)/2   (132 + 122)/2   (135 - 120)/2   (90 - 98)/2   (132 - 122)/2
(140 + 126)/2   (95 + 94)/2   (121 + 114)/2   (140 - 126)/2   (95 - 94)/2   (121 - 114)/2
(144 + 129)/2   (88 + 90)/2   (119 + 111)/2   (144 - 129)/2   (88 - 90)/2   (119 - 111)/2
(129 + 121)/2   (85 + 78)/2   (109 + 109)/2   (129 - 121)/2   (85 - 78)/2   (109 - 109)/2
(116 + 106)/2   (73 + 72)/2   (106 + 99)/2    (116 - 106)/2   (73 - 72)/2   (106 - 99)/2
(98 + 80)/2     (50 + 53)/2   (88 + 79)/2     (98 - 80)/2     (50 - 53)/2   (88 - 79)/2

Table 3. Illustration of pixel pair based on the row calculation result
Pixel Pair
127.5   94     127     7.5   -4     5
133     94.5   117.5   7      0.5   3.5
136.5   89     115     7.5   -1     4
125     81.5   109     4      3.5   0
111     72.5   102.5   5      0.5   3.5
 89     51.5    83.5   9     -1.5   4.5

Table 4. Illustration of the process of calculating the average pixel pair
Pixel Pair
(127.5 + 133)/2   (94 + 94.5)/2   (127 + 117.5)/2   (7.5 + 7)/2   (-4 + 0.5)/2   (5 + 3.5)/2
(127.5 - 133)/2   (94 - 94.5)/2   (127 - 117.5)/2   (7.5 - 7)/2   (-4 - 0.5)/2   (5 - 3.5)/2

Table 5. Illustration of the result from calculating average pixel pair based on the column
Pixel Pair
Sub-band LL                  Sub-band HL
130.25   94.25   122.25      7.25   -1.75   4.25
130.75   85.25   112         5.75    1.25   2
100      62       93         7      -0.5    4
Sub-band LH                  Sub-band HH
 -2.75   -0.25     4.75      0.25   -2.25   0.75
  5.75    3.75     3         1.75   -2.25   2
 11      10.5      9.5      -2       1     -0.5
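
The averaging scheme of Tables 1 to 5 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names `haar_rows`, `transpose`, and `haar_level` are hypothetical, and the code uses the unnormalized average and half-difference form shown in the tables rather than the orthonormal Haar filters (which scale by 1/sqrt(2)).

```python
# Sample matrix from Table 1.
SAMPLE = [
    [135, 120, 90, 98, 132, 122],
    [140, 126, 95, 94, 121, 114],
    [144, 129, 88, 90, 119, 111],
    [129, 121, 85, 78, 109, 109],
    [116, 106, 73, 72, 106, 99],
    [98, 80, 50, 53, 88, 79],
]


def haar_rows(matrix):
    """Pairwise average and half-difference along each row.

    The left half of each output row holds (a + b) / 2,
    the right half holds (a - b) / 2, as in Tables 2 and 3.
    """
    out = []
    for row in matrix:
        pairs = list(zip(row[0::2], row[1::2]))
        averages = [(a + b) / 2 for a, b in pairs]
        differences = [(a - b) / 2 for a, b in pairs]
        out.append(averages + differences)
    return out


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def haar_level(matrix):
    """One decomposition level: row pass, then column pass (Tables 4 and 5)."""
    row_pass = haar_rows(matrix)
    col_pass = transpose(haar_rows(transpose(row_pass)))
    half_r = len(matrix) // 2
    half_c = len(matrix[0]) // 2
    ll = [r[:half_c] for r in col_pass[:half_r]]   # top-left
    hl = [r[half_c:] for r in col_pass[:half_r]]   # top-right
    lh = [r[:half_c] for r in col_pass[half_r:]]   # bottom-left
    hh = [r[half_c:] for r in col_pass[half_r:]]   # bottom-right
    return ll, hl, lh, hh


ll, hl, lh, hh = haar_level(SAMPLE)
print(ll[0])  # [130.25, 94.25, 122.25], the first LL row of Table 5
```

Applying `haar_level` again to the returned `ll` block would give the next decomposition level, under the same assumptions.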


2.2. Hidden Markov models 

Each hidden Markov model is defined by its states, state probabilities, transition probabilities,
emission probabilities, and initial probabilities. To describe an HMM completely, the following five elements
should be specified:
a. N is the number of states in the model. The set of states is written as

$$S = \{S_1, S_2, \ldots, S_N\} \tag{1}$$

b. M is the number of observation symbols per state, $V = \{v_1, \ldots, v_M\}$. If the observations are
continuous, M is infinite.

c. The state transition probability distribution $A = \{a_{ij}\}$, where $a_{ij}$ is the probability of being in
state $S_j$ at time $t+1$ given that the state at time $t$ is $S_i$:

$$a_{ij} = P\{q_{t+1} = j \mid q_t = i\}, \quad 1 \le i, j \le N \tag{2}$$

Here $q_t$ denotes the current state. The transition probabilities must satisfy the normalization constraints
$a_{ij} \ge 0$, $1 \le i, j \le N$, and $\sum_{j=1}^{N} a_{ij} = 1$, $1 \le i \le N$.

d. The observation symbol probability distribution in each state, $B = \{b_j(k)\}$, where $b_j(k)$ is
the probability that symbol $v_k$ is emitted in state $S_j$:

$$b_j(k) = P\{o_t = v_k \mid q_t = j\}, \quad 1 \le j \le N, \; 1 \le k \le M \tag{3}$$

Here $v_k$ denotes the k-th observation symbol in the alphabet and $o_t$ is the current parameter vector.
The following stochastic constraints must be met: $b_j(k) \ge 0$, $1 \le j \le N$, $1 \le k \le M$, and
$\sum_{k=1}^{M} b_j(k) = 1$, $1 \le j \le N$.
e. The initial state distribution $\pi = \{\pi_i\}$, where $\pi_i$ is the probability that the model is in
state $S_i$ at the start of the sequence:

$$\pi_i = P\{q_1 = i\}, \quad 1 \le i \le N \tag{4}$$


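Before further analysis, two basic HMM problems must be solved:
a. Evaluation problem (forward and backward procedures)
The value is calculated by introducing a scaling function.
Scaling function:

$$c_t = \frac{1}{\sum_{i=1}^{N} \alpha_t(i)} \tag{5}$$

- Forward
- Initialization:

$$\alpha_1(i) = \pi_i\, b_i(o_1), \qquad \hat{\alpha}_1(i) = c_1\, \alpha_1(i) \tag{6}$$

- Recursion, for $1 \le t \le T-1$ and $1 \le j \le N$:

$$\alpha_{t+1}(j) = b_j(o_{t+1}) \sum_{i=1}^{N} \hat{\alpha}_t(i)\, a_{ij}, \qquad \hat{\alpha}_{t+1}(j) = c_{t+1}\, \alpha_{t+1}(j) \tag{7}$$

so that $\hat{\alpha}_{t+1}(j)$ equals the unscaled forward variable multiplied by $\prod_{s=1}^{t+1} c_s$.
- Termination, where $\lambda = (A, B, \pi)$ denotes the model:

$$\log P(O \mid \lambda) = -\sum_{t=1}^{T} \log c_t \tag{8}$$

- Backward
- Initialization:

$$\beta_T(i) = 1, \qquad \hat{\beta}_T(i) = c_T\, \beta_T(i) \tag{9}$$

- Recursion, for $t = T-1, T-2, \ldots, 1$ and $1 \le i \le N$:

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \hat{\beta}_{t+1}(j), \qquad \hat{\beta}_t(i) = c_t\, \beta_t(i) \tag{10}$$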


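b. Learning problem
The following quantities are computed to solve the learning problem:

$$\gamma_t(i) = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i)} \tag{11}$$

$$\xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, \beta_{t+1}(j)\, b_j(o_{t+1})}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, \beta_{t+1}(j)\, b_j(o_{t+1})} \tag{12}$$

The next step is re-estimating the parameters A, B, and $\pi$:

$$\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)} \tag{13}$$

$$\bar{b}_j(k) = \frac{\sum_{t=1,\; o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)} \tag{14}$$

$$\bar{\pi}_i = \gamma_1(i), \quad 1 \le i \le N \tag{15}$$

The process above is repeated until a satisfactory value is obtained [22-27].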
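
The scaled forward and backward passes and one re-estimation step can be sketched as follows. This is a minimal illustration for a single discrete observation sequence, assuming that `A`, `B`, and `pi` are plain Python lists and that observations are symbol indices; the function names are hypothetical and the code is not the authors' implementation. It assumes every state has non-zero posterior weight, otherwise the divisions would fail.

```python
import math


def forward_scaled(A, B, pi, obs):
    """Scaled forward pass; returns (alpha_hat, c, log P(O | model))."""
    n = len(pi)
    alpha_hat, c = [], []
    for t, o in enumerate(obs):
        if t == 0:
            raw = [pi[i] * B[i][o] for i in range(n)]
        else:
            prev = alpha_hat[-1]
            raw = [B[j][o] * sum(prev[i] * A[i][j] for i in range(n))
                   for j in range(n)]
        scale = 1.0 / sum(raw)          # scaling coefficient c_t
        c.append(scale)
        alpha_hat.append([scale * x for x in raw])
    return alpha_hat, c, -sum(math.log(x) for x in c)


def backward_scaled(A, B, obs, c):
    """Scaled backward pass that reuses the forward scaling coefficients."""
    n, T = len(A), len(obs)
    beta_hat = [[0.0] * n for _ in range(T)]
    beta_hat[T - 1] = [c[T - 1]] * n
    for t in range(T - 2, -1, -1):
        for i in range(n):
            raw = sum(A[i][j] * B[j][obs[t + 1]] * beta_hat[t + 1][j]
                      for j in range(n))
            beta_hat[t][i] = c[t] * raw
    return beta_hat


def baum_welch_step(A, B, pi, obs):
    """One re-estimation of (A, B, pi); also returns the old log-likelihood."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, c, loglik = forward_scaled(A, B, pi, obs)
    beta = backward_scaled(A, B, obs, c)
    gamma = []                           # state posteriors
    for t in range(T):
        w = [alpha[t][i] * beta[t][i] for i in range(n)]
        s = sum(w)
        gamma.append([x / s for x in w])
    xi = []                              # transition posteriors
    for t in range(T - 1):
        w = [[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
              for j in range(n)] for i in range(n)]
        s = sum(map(sum, w))
        xi.append([[x / s for x in row] for row in w])
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(m)] for j in range(n)]
    return new_A, new_B, new_pi, loglik
```

Because the posteriors are normalized at each time step, the scaling coefficients cancel and the scaled variables can be used directly. Calling `baum_welch_step` repeatedly until the log-likelihood stops improving corresponds to the iteration described above.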


2.3. Image pre-processing 

In this research, a system was designed to recognize hand gestures from images. The overall design is
illustrated in Figure 1. In training, Figure 1 (a), the inputs were RGB training images from the dataset, and
features were extracted from them with DWT. The final training step was to estimate the HMM parameters of each
class with the forward and backward procedures, using the feature vectors of the training images as input.
In testing, Figure 1 (b), the inputs were RGB test images from the dataset; a contour image was generated from
each by image resizing and skin color segmentation. The contour image was then processed with DWT and HMM:
the forward variables were calculated and the class with the highest probability was selected.

The image pre-processing in Figure 1 consisted of seven steps. The first step resized the image to
128x128 pixels. The second step converted the image from RGB to YCbCr (blue layer); the input to this step was
the RGB hand gesture image. The third step segmented the skin by setting a pixel value threshold, which
produced a segmented image. The fourth step was denoising, which removed noise while maintaining the signal
characteristics. The fifth step filled in the noise that could not be removed by the previous step. The sixth
step was dilation, which thickened the edge of the segmented image so that the required pixels could be
detected. The seventh step was erosion, which eroded the edge of the segmented image so that unnecessary pixels
could be removed. The output was a YCbCr hand gesture contour. The main task in pre-processing was to separate
the background from the object, which in this research was the right hand, as seen in Figure 2.
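
The segmentation, dilation, and erosion steps can be sketched in plain Python as follows. The chroma conversion uses the full-range BT.601 (JPEG) formulas. The threshold ranges `cb_range` and `cr_range` are assumed values commonly used for skin detection, not values reported in this paper, and the 3x3 neighbourhood is likewise an assumption. Resizing, denoising, and filling are omitted.

```python
def rgb_to_cbcr(r, g, b):
    """Full-range BT.601 chroma components (Cb, Cr) of one RGB pixel."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def skin_mask(image, cb_range=(77, 127), cr_range=(133, 173)):
    """Binary mask: 1 where both chroma values fall inside the assumed ranges."""
    mask = []
    for row in image:
        out = []
        for r, g, b in row:
            cb, cr = rgb_to_cbcr(r, g, b)
            inside = (cb_range[0] <= cb <= cb_range[1]
                      and cr_range[0] <= cr <= cr_range[1])
            out.append(int(inside))
        mask.append(out)
    return mask


def _neighbourhood(mask, y, x):
    """Values in the 3x3 window around (y, x), clipped at the image border."""
    h, w = len(mask), len(mask[0])
    return [mask[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]


def dilate(mask):
    """A pixel becomes 1 if any pixel in its 3x3 window is 1."""
    return [[int(any(_neighbourhood(mask, y, x)))
             for x in range(len(mask[0]))] for y in range(len(mask))]


def erode(mask):
    """A pixel stays 1 only if every pixel in its 3x3 window is 1."""
    return [[int(all(_neighbourhood(mask, y, x)))
             for x in range(len(mask[0]))] for y in range(len(mask))]
```

Running `erode(dilate(mask))` follows the dilation-then-erosion order described above; it closes small holes inside the hand region while leaving the outer shape roughly unchanged.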






[Figure 1 flowchart. Blocks recoverable from the source: test image, feature extraction (DWT), feature vector,
feature database (multi-vector), HMM training, identified / not identified.]

Figure 1. System design flowchart; (a) training, (b) testing


Figure 2. Hand gesture; (a) letter A, (b) letter B, (c) letter C, (d) point gesture, and (e) number 5


2.4. Feature extraction and classification 

In this research, DWT was used to find the hand features and to create a feature matrix representing
the pixel values of the image. The result was the contour image values in the LL, LH, HL, and HH sub-bands, as
in the example in Table 5. In the HMM classification process, illustrated in Figure 3, the input was a combined
vector built from the feature vectors of the training images produced by DWT feature extraction. In addition,
HMM required the values of A, B, $\pi$, the number of states, and the number of clusters. The number of states
had to be chosen, and the cluster values, used as the observation values $O_T$, were calculated with k-means.
The next process was calculating the forward variable, which consists of initialization [10, 28], recursion,
and termination [2].

Before this process, the scaling function was calculated. Next was the backward algorithm, which consisted
of two stages: initialization and recursion. The variables $\xi_t(i, j)$ and $\gamma_t(i)$ were then calculated
from the forward and backward variables defined previously. After the four variables were obtained,
the parameters A, B, and $\pi$ were re-estimated. The final step was to take the highest probability value
for the test image as the final hand gesture classification.
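
The quantization and decision steps can be sketched as follows. This is a minimal illustration, assuming scalar feature values, a simple one-dimensional k-means with deterministic initialization, and one log-likelihood score per class; the function names and the example scores are hypothetical, and the paper does not specify its exact k-means setup.

```python
def nearest(centroids, value):
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda i: abs(centroids[i] - value))


def kmeans_1d(values, k, iterations=20):
    """Lloyd's algorithm on scalars, initialized from evenly spaced order statistics."""
    ordered = sorted(values)
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest(centroids, v)].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids


def quantize(values, centroids):
    """Map each feature value to a cluster index (the HMM observation symbol)."""
    return [nearest(centroids, v) for v in values]


def classify(log_likelihoods):
    """Pick the class whose model gives the highest log-likelihood."""
    return max(log_likelihoods, key=log_likelihoods.get)


centroids = kmeans_1d([1.0, 1.2, 0.8, 10.0, 10.5, 9.5], k=2)
symbols = quantize([0.9, 9.9, 1.1], centroids)
best = classify({"A": -12.3, "B": -9.8})  # hypothetical per-class scores
```

In a full pipeline, each class would have its own trained model, and the scores passed to `classify` would come from the forward procedure applied to the quantized test sequence.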
Figure 3. Flowchart of classification process; (a) training, (b) testing


3. RESULTS AND ANALYSIS 

System testing was performed on the self-collected dataset, with each image resized to 128x128 pixels.
The purpose of the tests was to compare accuracy and system performance and to find the best-performing
parameters for the hand gesture recognition system. In total, 250 images from the dataset were used:
5 gesture classes with 50 images each.


3.1. Testing the system parameters 

The goal of parameter testing was to find the parameters with the best performance, specifically in
terms of system accuracy and computation time.
- Layer-type parameter impact

Each test used one type of layer, followed by DWT feature extraction and HMM classification, as shown
in Table 6. The blue layer gave the highest accuracy. This was attributed to the high pixel frequencies,
from 0 to 45, at intensity values from 0 to 231, compared with the other layer types, as shown in Figure 4.
- Sub-band-type parameter impact

This test used the layer with the best performance in the previous test, the blue layer, and varied
the DWT sub-band among four types: low-low (LL), low-high (LH), high-low (HL), and high-high (HH).
The results are given in Table 7; the best parameter was the LL sub-band. The LL sub-band is smoother than
the other sub-band types, as shown in Figure 5.
- Decomposition level parameter impact

This test analyzed DWT decomposition levels 1, 2, 3, and 4 on the dataset, using the best parameters
from the two previous tests, the blue layer and the LL sub-band. The results are given in Table 8, and
the feature values at each decomposition level are plotted in Figure 6. Raising the decomposition level
reduced the number of feature values obtained. The smaller the decomposition level, the shorter
the computation time. Accuracy did not follow the same pattern: only some decomposition levels produced
feature values distinct enough to differentiate the classes.


Table 6. Layer-type parameters performances
Layer        Total testing data   Total correct data   Accuracy (%)   Computation time (s)
Red          100                  50                   50             46
Green        100                  20                   20             41
Blue         100                  68                   68             43
Grayscale    100                  59                   59             49
Binary       100                  32                   32             56
YCbCr (Cr)   100                  53                   53             44
HSV (V)      100                  20                   20             40


Table 7. Sub-band-type parameters performances
Sub-band   Total testing data   Total correct data   Accuracy (%)   Computation time (s)
LL         100                  68                   68             43
LH         100                  20                   20             (illegible)
HL         100                  20                   20             54
HH         100                  20                   20             55


Figure 4. Histogram of blue layer images

Figure 5. Illustration of images in sub-band


Table 8. Performances of decomposition level parameters
Level   Total testing data   Total correct data   Accuracy (%)   Computation time (s)
1       100                  68                   68             43
2       100                  20                   20             57
3       100                  20                   20             54
4       100                  20                   20             55

Figure 6. Feature values of various levels of decomposition
- Mother wavelet parameter impact

Four mother wavelets were tested: Haar, db3, db5, and db7, using the best parameters from
the previous tests: the blue layer, the LL sub-band, and level 1 decomposition. The results are listed in
Table 9. The best result was obtained with the Haar mother wavelet. The graph in Figure 7 shows that
different mother wavelets produce different feature shapes for the same class. The choice of mother wavelet
can therefore give each class distinctive features, so that the classes can be distinguished from one another.


Table 9. Mother wavelet parameters performances
Mother wavelet   Total testing data   Total correct data   Accuracy (%)   Computation time (s)
Haar             100                  68                   68             43
db3              100                  20                   20             43
db5              100                  34                   34             43
db7              100                  50                   50             44











Figure 7. Feature values of various mother wavelet


- Number of clusters parameter impact

This test varied the number of clusters used in HMM classification: 50, 100, 200, 400, 800, and 1000,
with the best parameters from the previous tests. As Table 10 shows, the best number of clusters was 800.
Figure 8 shows the feature values with 50 clusters, for which features of the same type gave lower accuracy
than with 800 clusters.


Table 10. Amount of cluster parameters performances 
Cluster total Total testing data Total correct data Accuracy (%) Computation time (s) 


50 100 58 58 52 
100 100 36 36 51 
200 100 45 45 51 
400 100 55 55 52 
800 100 68 68 43 
1000 100 25 25 53 


- Number of states impact

The next test examined how the number of states used in HMM classification affects system accuracy and
computation time. The numbers of states tested were 4, 5, 25, 50, 100, and 150. The best performance was
obtained with 5 states; all results are listed in Table 11. This happens because an HMM essentially divides
the data into as many parts as there are states, so if the number of states is poorly chosen, the test data
become difficult to identify.
Figure 8. Feature value in cluster 50


Table 11. Number of state impact performances
State total   Total testing data   Total correct data   Accuracy (%)   Computation time (s)
4             100                  30                   30             42.4
5             100                  72                   72             53
25            100                  49                   49             53
50            100                  32                   32             58
100           100                  51                   51             63
150           100                  42                   42             71


3.2. Testing the data batch 

The tested data splits are shown in Table 12. The recognition system with DWT and HMM identified
gestures well when 60% of the images in each class were used for training and 40% for testing.


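Table 12. Data batch testing performances
Data (training-test)   Accuracy (%)   Computation time (s)
40-10                  (illegible)    (illegible)
30-20                  72             53
25-25                  20             (illegible)
A value of 22 survives beside the 40-10 row, but its column cannot be determined.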

3.3. Classification testing 

Classification testing compared the accuracy and computation time of two classification methods,
K-Nearest Neighbor (K-NN) and HMM. Each classifier was evaluated from training data to training data and
from training data to test data, as shown in Table 13. Based on Table 13, HMM had a lower accuracy when
training data were tested against training data than when training data were tested against test data.
This happened because the data proportion was 50-50% when training data were tested against training data,
whereas it was 60-40% when training data were tested against test data. Table 13 also shows results for
a second dataset, the Sebastien Marcel static hand posture database [29, 30]. That dataset, whose images were
resized to 76x66 pixels, gave a lower HMM accuracy than the authors' dataset. Its right-hand gestures are shown
in Figure 9.
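
For reference, a K-NN baseline of the kind compared here can be sketched as follows. The paper does not state the value of k or the distance measure, so Euclidean distance and the default `k=1` are assumptions, and the function name `knn_predict` is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=1):
    """Majority vote among the k training vectors closest to query."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy feature vectors, for illustration only.
features = [[0, 0], [0, 1], [5, 5], [6, 5]]
labels = ["A", "A", "B", "B"]
predicted = knn_predict(features, labels, [0.2, 0.1], k=3)
```

With k = 1 and distinct feature vectors, every training sample is its own nearest neighbour, so evaluating on the training data gives 100% accuracy by construction. This may explain the K-NN training column in Table 13, although the source does not say so.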


Table 13. Classification testing performance on classification methods
Dataset             Classification   Accuracy (%)         Computation time (s)
                                     Training   Test      Training   Test
Sebastien dataset   K-NN             100        100       3.78       (illegible)
                    HMM              38         58        2.6        2.01
Writer's dataset    K-NN             100        100       58         40
                    HMM              55         72        68         53

Figure 9. Sebastien's dataset of hand gestures; (a) letter A, (b) letter B, (c) letter C, (d) point gesture,
(e) number 5


4. CONCLUSION 

This paper proposed a hand gesture recognition system for 5 types of gestures: letter A, letter B,
letter C, point, and number 5 (five). The best parameters were the blue layer, the low-low sub-band, level 1
decomposition, and the Haar mother wavelet for DWT, and 800 clusters and 5 states for HMM. With these
parameters the system reached an accuracy of 72% with a computation time of 53 seconds. The best data split
was 30 training images and 20 test images. Layers with high accuracy have a good contrast and brightness
ratio. The dataset images had high contrast and brightness in the blue layer, owing to the high pixel
frequencies, from 0 to 45, at intensity values from 0 to 231, compared with the other layers.

DWT had three test parameters: sub-band type, decomposition level, and mother wavelet. The LL sub-band
yields a smooth image characteristic. The decomposition level converts the image into a simpler form from
which distinctive features can be obtained. Different mother wavelets produce different feature shapes.
The HMM classification had two test parameters: the number of clusters and the number of states. The cluster
parameter determines which feature values are used. The number of clusters must be chosen appropriately;
otherwise the feature values of each class become similar. Classification from training data to training data
had a lower accuracy (55%) than from training data to test data (72%). This happened because the data
proportion was 50-50% when training data were tested against other training data, whereas it was 60%-40% when
training data were tested against test data. We decided to create our own dataset because the Sebastien
dataset reached an accuracy of only 58% with an image size of 76x66 pixels, using 2nd-level decomposition and
the db5 mother wavelet. With 2nd-level DWT feature extraction and HMM classification, the images went through
three compression steps, so the gestures extracted from the images were so small that they were harder to
classify. Hence, to raise the accuracy, we produced our own dataset with good brightness and contrast. With
the resolution increased to 128x128 pixels, the accuracy rose by 14 percentage points to 72%.